Fire up GraphLab Create


In [2]:
import graphlab

Load a tabular dataset


In [3]:
sf = graphlab.SFrame('people-example.csv')


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1474175545.log
This non-commercial license of GraphLab Create for academic use is assigned to sudhanshu.shekhar.iitd@gmail.com and will expire on September 18, 2017.
Finished parsing file /Users/sud/Documents/Coursera/coursera/ml-foundations/week-1/people-example.csv
Parsing completed. Parsed 7 lines in 0.028841 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /Users/sud/Documents/Coursera/coursera/ml-foundations/week-1/people-example.csv
Parsing completed. Parsed 7 lines in 0.010787 secs.
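The parser log above reports that the column types were inferred from the first lines of the file. As a rough illustration of that idea only (this is not GraphLab's implementation), the Python 3 standard-library sketch below guesses one type per column from a few sample rows. The comma-separated layout of the sample text is an assumption based on the table shown in this notebook, and `infer_type` and `infer_column_types` are hypothetical helper names.

```python
import csv
import io

# Assumed CSV layout: header plus three rows as displayed in this notebook.
CSV_TEXT = """First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
"""

def infer_type(values):
    """Return int if every value parses as int, else float, else str."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str

def infer_column_types(text, sample_rows=100):
    """Hypothetical helper: guess one type per column from the first rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Read at most sample_rows data rows, then transpose rows into columns.
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows))
    return header, [infer_type(column) for column in columns]

header, types = infer_column_types(CSV_TEXT)
print([t.__name__ for t in types])
```

If the inferred types were wrong for a real file, the log suggests passing a corrected list through the `column_type_hints` argument of `read_csv`.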

SFrame basics


In [4]:
sf  # display the first few rows of the table


Out[4]:
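+------------+-----------+---------------+-----+
| First Name | Last Name | Country       | age |
+------------+-----------+---------------+-----+
| Bob        | Smith     | United States | 24  |
| Alice      | Williams  | Canada        | 23  |
| Malcolm    | Jone      | England       | 22  |
| Felix      | Brown     | USA           | 23  |
| Alex       | Cooper    | Poland        | 23  |
| Tod        | Campbell  | United States | 22  |
| Derek      | Ward      | Switzerland   | 25  |
+------------+-----------+---------------+-----+
[7 rows x 4 columns]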

In [5]:
sf.head()  # first rows of the table


Out[5]:
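+------------+-----------+---------------+-----+
| First Name | Last Name | Country       | age |
+------------+-----------+---------------+-----+
| Bob        | Smith     | United States | 24  |
| Alice      | Williams  | Canada        | 23  |
| Malcolm    | Jone      | England       | 22  |
| Felix      | Brown     | USA           | 23  |
| Alex       | Cooper    | Poland        | 23  |
| Tod        | Campbell  | United States | 22  |
| Derek      | Ward      | Switzerland   | 25  |
+------------+-----------+---------------+-----+
[7 rows x 4 columns]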

In [6]:
sf.tail()  # last rows of the table; same as head() here because there are only 7 rows


Out[6]:
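+------------+-----------+---------------+-----+
| First Name | Last Name | Country       | age |
+------------+-----------+---------------+-----+
| Bob        | Smith     | United States | 24  |
| Alice      | Williams  | Canada        | 23  |
| Malcolm    | Jone      | England       | 22  |
| Felix      | Brown     | USA           | 23  |
| Alex       | Cooper    | Poland        | 23  |
| Tod        | Campbell  | United States | 22  |
| Derek      | Ward      | Switzerland   | 25  |
+------------+-----------+---------------+-----+
[7 rows x 4 columns]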
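The head and tail outputs above are identical because the table has only 7 rows. A minimal plain-Python sketch of the same slicing behaviour follows, assuming a default of 10 rows for both calls; `head` and `tail` here are hypothetical stand-ins operating on a list, not the SFrame methods themselves.

```python
# The "age" column from the table above, in row order.
ages = [24, 23, 22, 23, 23, 22, 25]

def head(rows, n=10):
    """First n rows (hypothetical stand-in for SFrame.head)."""
    return rows[:n]

def tail(rows, n=10):
    """Last n rows (hypothetical stand-in for SFrame.tail)."""
    return rows[-n:] if n > 0 else []

# With fewer than 10 rows, both calls return the whole column.
print(head(ages) == tail(ages))

# With a smaller n, the two views differ.
print(head(ages, 3), tail(ages, 3))
```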

GraphLab Canvas


In [7]:
sf.show()


Canvas is accessible via web browser at the URL: http://localhost:56288/index.html
Opening Canvas in default web browser.

In [8]:
graphlab.canvas.set_target('ipynb')

In [9]:
sf['age'].show(view='Categorical')


Inspect the dataset


In [10]:
sf['Country']


Out[10]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [11]:
sf['age'].mean()


Out[11]:
23.142857142857146
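As a quick sanity check on the mean above, the same arithmetic can be done in plain Python with the ages copied from the table. The last printed digits may differ slightly from Out[11], since floating-point results depend on how the sum is accumulated.

```python
# Ages copied from the table in this notebook.
ages = [24, 23, 22, 23, 23, 22, 25]

total = sum(ages)                      # 162
mean_age = total / float(len(ages))    # 162 / 7
print(total, mean_age)
```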

Creating new columns


In [12]:
sf


Out[12]:
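+------------+-----------+---------------+-----+
| First Name | Last Name | Country       | age |
+------------+-----------+---------------+-----+
| Bob        | Smith     | United States | 24  |
| Alice      | Williams  | Canada        | 23  |
| Malcolm    | Jone      | England       | 22  |
| Felix      | Brown     | USA           | 23  |
| Alex       | Cooper    | Poland        | 23  |
| Tod        | Campbell  | United States | 22  |
| Derek      | Ward      | Switzerland   | 25  |
+------------+-----------+---------------+-----+
[7 rows x 4 columns]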

In [13]:
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']

In [14]:
sf


Out[14]:
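+------------+-----------+---------------+-----+----------------+
| First Name | Last Name | Country       | age | Full Name      |
+------------+-----------+---------------+-----+----------------+
| Bob        | Smith     | United States | 24  | Bob Smith      |
| Alice      | Williams  | Canada        | 23  | Alice Williams |
| Malcolm    | Jone      | England       | 22  | Malcolm Jone   |
| Felix      | Brown     | USA           | 23  | Felix Brown    |
| Alex       | Cooper    | Poland        | 23  | Alex Cooper    |
| Tod        | Campbell  | United States | 22  | Tod Campbell   |
| Derek      | Ward      | Switzerland   | 25  | Derek Ward     |
+------------+-----------+---------------+-----+----------------+
[7 rows x 5 columns]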
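The column expression in In [13] combines the two name columns element by element. A plain-Python sketch of the same operation, using lists copied from the table as stand-ins for the columns, might look like this:

```python
# Columns copied from the table above; plain lists stand in for SFrame columns.
first_names = ["Bob", "Alice", "Malcolm", "Felix", "Alex", "Tod", "Derek"]
last_names = ["Smith", "Williams", "Jone", "Brown", "Cooper", "Campbell", "Ward"]

# Pair up the values row by row and join each pair with a space.
full_names = [first + " " + last for first, last in zip(first_names, last_names)]
print(full_names)
```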

Apply a function for data transformation


In [15]:
sf['Country']


Out[15]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [16]:
sf['Country'].show()



In [18]:
def transform_country(country):
    return 'United States' if country == 'USA' else country

In [20]:
transform_country('USA')


Out[20]:
'United States'

In [21]:
transform_country('India')


Out[21]:
'India'
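If more spellings than 'USA' needed to be normalized, one possible variant is a lookup table instead of a conditional. This is only a sketch: `COUNTRY_ALIASES` and `normalize_country` are hypothetical names that do not appear in the notebook.

```python
# Hypothetical alias table; extend it with other spellings as needed.
COUNTRY_ALIASES = {"USA": "United States"}

def normalize_country(country):
    """Return the canonical name if an alias is known, else the input unchanged."""
    return COUNTRY_ALIASES.get(country, country)

print(normalize_country("USA"))
print(normalize_country("India"))
```

A function like this could be passed to `apply` in the same way as `transform_country`.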

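In [23]:
sf


Out[23]:
+------------+-----------+---------------+-----+----------------+
| First Name | Last Name | Country       | age | Full Name      |
+------------+-----------+---------------+-----+----------------+
| Bob        | Smith     | United States | 24  | Bob Smith      |
| Alice      | Williams  | Canada        | 23  | Alice Williams |
| Malcolm    | Jone      | England       | 22  | Malcolm Jone   |
| Felix      | Brown     | USA           | 23  | Felix Brown    |
| Alex       | Cooper    | Poland        | 23  | Alex Cooper    |
| Tod        | Campbell  | United States | 22  | Tod Campbell   |
| Derek      | Ward      | Switzerland   | 25  | Derek Ward     |
+------------+-----------+---------------+-----+----------------+
[7 rows x 5 columns]

In [24]:
sf['Country'] = sf['Country'].apply(transform_country)  # run the function on every value in the column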

In [25]:
sf


Out[25]:
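+------------+-----------+---------------+-----+----------------+
| First Name | Last Name | Country       | age | Full Name      |
+------------+-----------+---------------+-----+----------------+
| Bob        | Smith     | United States | 24  | Bob Smith      |
| Alice      | Williams  | Canada        | 23  | Alice Williams |
| Malcolm    | Jone      | England       | 22  | Malcolm Jone   |
| Felix      | Brown     | United States | 23  | Felix Brown    |
| Alex       | Cooper    | Poland        | 23  | Alex Cooper    |
| Tod        | Campbell  | United States | 22  | Tod Campbell   |
| Derek      | Ward      | Switzerland   | 25  | Derek Ward     |
+------------+-----------+---------------+-----+----------------+
[7 rows x 5 columns]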
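The effect of the transformation can also be checked without Canvas. The sketch below applies `transform_country` to a plain list copied from Out[15] and counts the values before and after with `collections.Counter`; the list is a stand-in for the column, not the SFrame itself.

```python
from collections import Counter

def transform_country(country):
    return 'United States' if country == 'USA' else country

# Country values copied from Out[15].
countries = ['United States', 'Canada', 'England', 'USA',
             'Poland', 'United States', 'Switzerland']

# Element-wise application, analogous to calling apply on the column.
cleaned = [transform_country(c) for c in countries]

before = Counter(countries)
after = Counter(cleaned)
print(before['United States'], after['United States'])
print('USA' in after)
```

After the transformation, the 'USA' entry is folded into 'United States', so the count for that value goes from 2 to 3.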

In [26]:
sf['Country'].show()


